114 research outputs found

    Revisiting Precision and Recall Definition for Generative Model Evaluation

    In this article we revisit the definition of Precision-Recall (PR) curves for generative models proposed by Sajjadi et al. (arXiv:1806.00035). Rather than providing a scalar for generative quality, PR curves distinguish mode collapse (poor recall) from bad quality (poor precision). We first generalize their formulation to arbitrary measures, hence removing any restriction to finite support. We also expose a bridge between PR curves and the type I and type II error rates of likelihood ratio classifiers on the task of discriminating between samples of the two distributions. Building upon this new perspective, we propose a novel algorithm to approximate precision-recall curves that shares some interesting methodological properties with the hypothesis testing technique from Lopez-Paz et al. (arXiv:1610.06545). We demonstrate the interest of the proposed formulation over the original approach on controlled multi-modal datasets. Comment: ICML 201

    A Fast Multi-Layer Approximation to Semi-Discrete Optimal Transport

    The optimal transport (OT) framework has been widely used in inverse imaging and computer vision problems as a way to incorporate statistical constraints or priors. In recent years, OT has also been used in machine learning, mostly as a metric to compare probability distributions. This work addresses the semi-discrete OT problem, where a continuous source distribution is matched to a discrete target distribution. We introduce a fast stochastic algorithm to approximate such a semi-discrete OT problem using a hierarchical multi-layer transport plan. This method allows for tractable computation in the high-dimensional case and for large point clouds, both during training and at synthesis time. Experiments demonstrate its numerical advantage over multi-scale (or multi-level) methods. Applications to fast exemplar-based texture synthesis based on patch matching with two layers also show stunning improvements over previous single-layer approaches. This shallow model achieves results comparable to state-of-the-art deep learning methods, while being very compact, faster to train, and using a single image during training instead of a large dataset.

    Local matching indicators for transport problems with concave costs

    In this paper, we introduce a class of indicators that make it possible to efficiently compute optimal transport plans associated with arbitrary distributions of N demands and M supplies in R in the case where the cost function is concave. The computational cost of these indicators is small and independent of N. Using them hierarchically yields an efficient algorithm.

    Generating Private Data Surrogates for Vision Related Tasks

    With the widespread application of deep networks in industry, membership inference attacks, i.e. the ability to discern training data from a model, are becoming more and more problematic for data privacy. Recent work suggests that generative networks may be robust against membership attacks. In this work, we build on this observation, offering a general-purpose solution to the membership privacy problem. As the primary contribution, we demonstrate how to construct surrogate datasets using images from GAN generators, labelled with a classifier trained on the private dataset. Next, we show that this surrogate data can further be used for a variety of downstream tasks (here classification and regression), while being resistant to membership attacks. We study a variety of different GANs proposed in the literature, concluding that higher-quality GANs result in better surrogate data with respect to the task at hand.

    On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity

    The recent advent of powerful generative models has triggered the renewed development of quantitative measures to assess the proximity of two probability distributions. While the scalar Fréchet inception distance remains popular, several methods have explored computing entire curves, which reveal the trade-off between the fidelity and variability of the first distribution with respect to the second one. Several such variants have been proposed independently and, while intuitively similar, their relationship has not yet been made explicit. In an effort to make the emerging picture of generative evaluation clearer, we propose a unification of four curves known respectively as the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve, and a special case of Rényi divergence frontiers. In addition, we discuss possible links between PR / Lorenz curves and the derivation of domain adaptation bounds. Comment: 10 pages, 3 figure

    Detecting Overfitting of Deep Generative Networks via Latent Recovery

    State-of-the-art deep generative networks are capable of producing images with such incredible realism that they can be suspected of memorizing training images. This is why it is not uncommon to include visualizations of training set nearest neighbors, to suggest that generated images are not simply memorized. We demonstrate that this is not sufficient, which motivates the need to study memorization/overfitting of deep generators with more scrutiny. This paper addresses this question by i) showing how simple losses are highly effective at reconstructing images for deep generators, and ii) analyzing the statistics of reconstruction errors when reconstructing training and validation images, which is the standard way to analyze overfitting in machine learning. Using this methodology, this paper shows that overfitting is not detectable in the pure GAN models proposed in the literature, in contrast with those using hybrid adversarial losses, which are amongst the most widely applied generative methods. The paper also shows that standard GAN evaluation metrics fail to capture memorization for some deep generators. Finally, the paper shows how off-the-shelf GAN generators can be successfully applied to face inpainting and face super-resolution using the proposed reconstruction method, without hybrid adversarial losses.

    Co-segmentation non-supervisée d'images utilisant les distances de Sinkhorn

    In this work, a convex and robust formulation of the unsupervised co-segmentation problem is introduced for pairs of images. The proposed model relies on optimal transport theory to assess the statistical similarity of the segmented regions' features (color histograms in this work), by measuring the transport cost between the feature histograms. To reduce the optimization complexity, the optimal transport cost is approximated by the Sinkhorn distance, which is formulated as the entropic regularization of optimal transport. An iterative primal-dual algorithm is used to solve the problem efficiently and exactly, without making use of sub-iterative routines.

    Mise en correspondance de descripteurs géométriques locaux par méthode a contrario

    Many image analysis applications rely on a representation by local descriptors such as SIFT [3]. Matching these descriptors, although crucial, is most often reduced to thresholding the distance to the nearest neighbor. In this contribution, a new dissimilarity measure that is robust to descriptor quantization is proposed. We then present a matching criterion, inspired by "a contrario" methods [1], which assesses the significance of the tested matches and provides validation thresholds that adapt automatically to the complexity and diversity of the data.

    Methods to Improve Bulk Lifetime in n-Type Czochralski-Grown Upgraded Metallurgical-Grade Silicon Wafers

    This paper investigates the potential of three different methods, tabula rasa (TR), phosphorus diffusion gettering (PDG), and hydrogenation, for improving the carrier lifetime in n-type Czochralski-grown upgraded metallurgical-grade (UMG) silicon samples. Our results show that the lifetimes in the UMG wafers used in this study were affected by both mobile metallic impurities and as-grown oxygen precipitate nuclei. Thus, the dissolution of grown-in oxygen precipitate nuclei via TR and the removal of mobile impurities via a PDG step were found to significantly improve the electronic quality of the UMG wafers. Finally, we report bulk lifetimes and 1-sun implied open-circuit voltages of the UMG wafers after boron and phosphorus diffusions, as typically applied in n-type cell fabrication. This work has been supported by the Australian Renewable Energy Agency (ARENA) through research grant RND009.